161 results found.
Speech/Written
Lexicon,
Language Type:
Multilingual
Languages:
Czech English Finnish French German Russian
Availability:
Freely Available
License:
Size:
206,395 sentences
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English
- Paper track: Long paper/
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Gongbo Tang | MuCow | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Afrikaans Albanian Amharic Arabic Aragonese Armenian Assamese Azerbaijani Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Catalan Central Khmer Chinese Croatian Czech Danish Dutch Dzongkha English Esperanto Estonian Finnish French Gaelic Galician Georgian German Greek Gujarati Hausa Hebrew Hindi Hungarian Icelandic Igbo Indonesian Irish Italian Japanese Kannada Kazakh Kinyarwanda Korean Kurdish Kyrgyz Latvian Limburgan Lithuanian Macedonian Malagasy Malay Malayalam Maltese Marathi Mongolian Nepali Northern Sami Norwegian Norwegian Bokmål Norwegian Nynorsk Occitan Oriya Panjabi Pashto Persian Polish Portuguese Romanian Russian Serbian Serbo-Croatian Sinhala Slovak Slovenian Spanish Swedish Tajik Tamil Tatar Telugu Thai Turkish Turkmen Uighur Ukrainian Urdu Uzbek Vietnamese Walloon Welsh Western Frisian Xhosa Yiddish Yoruba Zulu
Availability:
Freely Available
License:
Size:
55 million sentences
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
- Paper track: Long/Machine Translation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Biao Zhang | the open parallel corpus (OPUS) | /N |
Documentation:
None
Not Applicable
Contextualised word embeddings,
Language Type:
Monolingual
Languages:
Ancient Arabic Basque Bokmål Bulgarian Catalan Chinese Church Croatian Czech Danish Dutch English Estonian Finnish French Galician German Greek Hebrew Hindi Hungarian Indonesian Irish Italian Japanese Korean Latin Latvian Norwegian Nynorsk Old Persian Polish Portuguese Romanian Russian Simplified Chinese Slavonic Slovak Slovene Spanish Swedish Turkish Ukrainian Urdu Uyghur Vietnamese
Availability:
Freely Available
License:
none
Size:
18.4 GByte
Production Status:
Existing-used
Use:
Parsing and Tagging
- Paper title: Treebank Embedding Vectors for Out-of-domain Dependency Parsing
- Paper track: Short/Syntax: Tagging, Chunking and Parsing
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Joachim Wagner | Elmo For Many Languages | /N |
Documentation:
https://www.aclweb.org/anthology/K18-2005/
Written
Corpus,
Language Type:
Multilingual
Languages:
Chinese Czech English Finnish German Latvian Romanian Russian Turkish
Availability:
Freely Available
License:
Size:
3.9 MByte
Production Status:
Existing-used
Use:
Evaluation/Validation
- Paper title: Automatic Machine Translation Evaluation using Source Language Inputs and Cross-lingual Language Model
- Paper track: Short/Resources and Evaluation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Kosuke Takahashi | WMT18 metrics shared task data | /N |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
English French German Portuguese Romanian Russian Spanish
Availability:
Freely Available
License:
CreativeCommons
Size:
500 hours
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Adapting Transformer to End-to-End Spoken Language Translation
- Paper track: 12.1 Spoken machine translation/Oral Presentation
- Paper status: Accept - Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mattia A. Di Gangi | MuST-C | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Arabic Bengali Central Khmer Chinese Dari Egyptian Arabic English Georgian Hindi Iranian Persian Italian Japanese Korean Lao Mandarin Chinese Min Nan Chinese Moroccan Arabic Northern Khmer Panjabi Persian Russian Spanish Tagalog Thai Tigrinya Urdu Uzbek Vietnamese Wu Chinese Yue Chinese
Availability:
From Data Center(s)
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: End-to-End Neural Speaker Diarization with Permutation-Free Objectives
- Paper track: 4.5 Speaker diarization/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yusuke Fujita | 2008 NIST Speaker Recognition Evaluation | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Bilingual
Languages:
Arabic Bengali Chinese English Hindi Korean Russian Thai Urdu
Availability:
From Data Center(s)
License:
LDC
Size:
595 hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: End-to-End Neural Speaker Diarization with Permutation-Free Objectives
- Paper track: 4.5 Speaker diarization/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yusuke Fujita | 2006 NIST Speaker Recognition Evaluation Training Set | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Arabic English Mandarin Chinese Russian Spanish
Availability:
From Data Center(s)
License:
LDC
Size:
392 hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: End-to-End Neural Speaker Diarization with Permutation-Free Objectives
- Paper track: 4.5 Speaker diarization/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yusuke Fujita | 2005 NIST Speaker Recognition Evaluation Training Data | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Egyptian Arabic English French German Hindi Iranian Persian Japanese Korean Mandarin Chinese Russian Spanish Tamil Vietnamese
Availability:
From Owner
License:
LDC
Size:
46 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2003 NIST Language Recognition Evaluation | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Arabic Bengali Dari English German Hindi Iranian Persian Japanese Korean Mandarin Chinese Persian Russian Spanish Standard Arabic Tamil Thai Vietnamese Yue Chinese
Availability:
From Owner
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2007 NIST Language Recognition Evaluation Test Set | /N |
Documentation:
None




